"Quantization method and system, for instance for
video MPEG applications, and computer program product 
therefor" 



Field of the invention 

The present invention relates to techniques for 
encoding/ transcoding digital video sequences. With the 
advent of new media, video compression is increasingly 
being applied. In a video broadcast environment, a
variety of channels and supports exist, associated with a
variety of standards for content encoding and decoding.

Of all the standards available, MPEG (a well known 
acronym for Moving Pictures Experts Group) is nowadays 
adopted worldwide for quite different applications. 

An example is the transmission of video signals both
for standard television (SDTV) and high definition
television (HDTV). HDTV demands bit rates up to 40
Mbit/s: MPEG is thus widely used for Set-Top-Box and
DVD applications.

Another example is the transmission over an error-prone
channel with a very low bit rate (down to 64
Kbit/s), like the Internet and third generation wireless
communications terminals.

One of the basic blocks of an encoding scheme such 
as MPEG is the quantizer: this is a key block in the 
entire encoding scheme because the quantizer is where 
the original information is partially lost, as a result 
of spatial redundancy being removed from the images. 
The quantizer also introduces the so-called
"quantization error", which must be minimized,
especially when a re-quantization step takes place, as
is the case, inter alia, when a compressed stream is to be
re-encoded for a different platform, channel, storage,
etc.

Another important block, common to both encoding and
transcoding systems, is the rate control: this block is
responsible for checking the real output bitrate
generated, and correspondingly adjusting the quantization
level to meet the output bitrate requirements as
needed.

Description of the related art

The MPEG video standard is based on a video 
compression procedure that exploits the high degree of 
spatial and temporal correlation existing in natural 
video sequences. 

As shown in the block diagram of figure 1, an input
video sequence is subject to frame reorder at 10 and
then fed to a motion estimation block 12 associated
with an anchor frames buffer 14. Hybrid DPCM/DCT coding
removes temporal redundancy using inter-frame motion
estimation. The residual error images generated at 16
are further processed via a Discrete Cosine Transform
(DCT) at 18, which reduces spatial redundancy by
de-correlating the pixels within a block and concentrating
the energy of the block into a few low order
coefficients. Finally, scalar quantization (Quant)
performed at 20 and variable length coding (VLC)
carried out at 22 produce a bitstream with good
statistical compression efficiency.

Due to the intrinsic structure of MPEG, the final
bitstream is produced at a variable and unconstrained
bitrate; hence, in order to control it, or when the
output channel requires a constant bitrate, an output
buffer 24 and a feedback bitrate controller block 26,
which defines the granularity of scalar quantization,
must be added.

In the block diagram of figure 1, reference number
28 designates a multiplexer adapted for feeding the
buffer 24 with either the VLC coded signals or signals
derived from the motion estimation block 12, while
references 30, 32, and 39 designate an inverse
quantizer, an inverse DCT (IDCT) module and a summation
node included in the loop encoder to feed the anchor
frames buffer 14.

All of the foregoing is well known to those of skill 
in the art, thus making a more detailed explanation 
unnecessary under the circumstances. 

The MPEG standard defines the syntax and semantics
of the output bitstream OS and the functionality of
the decoder. However, the encoder is not strictly 
standardized: any encoder that produces a valid MPEG 
bitstream is acceptable. 

Motion estimation is used to evaluate similarities 
among successive pictures, in order to remove temporal 
redundancy, i.e. to transmit only the difference among 
successive pictures. In particular, Block Matching
Motion Estimation (BM-ME) is a common way of extracting
the existing similarities among pictures and is the 
technique selected by the MPEG-2 standard. 

Recently, adapting the multimedia content to the 
client devices is becoming more and more important, and 
this expands the range of transformations to be 
effected on the media objects. 

General access to multimedia contents can be 
provided in two basic ways. 

The former is storing, managing, selecting, and 
delivering different versions of the media objects 
(images, video, audio, graphics and text) that comprise 
the multimedia presentations. 

The latter is manipulating the media objects "on the
fly", by using, for example, methods for text-to-speech
translation, image and video transcoding, media
conversion, and summarization.

Multimedia content delivery thus can be adapted to
the wide diversity of client device capabilities in
communication, processing, storage and display.

In either of the basic ways considered in the foregoing,
the need arises for converting a compressed signal into
another compressed signal format. A device that
performs such an operation is called a transcoder. Such
a device could be placed in a network to help relay
transmissions between different bit rates or could be
used as a pre-processing tool to create various
versions of the media objects possibly needed as
mentioned in the foregoing.

For example, a DVD movie MPEG-2 encoded at 8 Mbit/s
at standard definition (Main Profile at Main Level) may
be selected by a user wishing to watch it using a
portable wireless device equipped with a CIF display. To
permit this, the movie must be MPEG-2 decoded, the
picture resolution changed from standard definition to
CIF and then MPEG-4 encoded. The resulting bitstream
at, e.g., 64 Kbit/s is thus adapted to be transmitted
over a limited bandwidth error-prone channel, received
by the portable device and MPEG-4 decoded for related
display. The issue is therefore to cleverly adapt the
bitrate and the picture resolution of a compressed data
stream compliant with a certain video standard (e.g.
MPEG-2) to another one (e.g. MPEG-4).

A widely adopted procedure is to decode the incoming
bitstream, optionally to down-sample the decoded images
to generate a sequence with a reduced picture size, and
then re-encode the sequence with a new encoder
configured to achieve the required bitrate.

Alternative methods have been developed as
witnessed, e.g. by EP-A-1 231 793, EP-A-1 231 794 or
European patent application No. 01830589.6. These and
similar systems are adapted to work directly in the DCT
domain, incorporating the decoder and the encoder, and
re-utilizing useful information available (like motion
vectors, for example).

These systems are adapted to remove unnecessary 
redundancies present in the system. In any case, a de- 
quantization followed by a re-quantization step (called 
"requantizer") is usually required together with an 
output rate control function. 

Theory of quantization processes 

In order to better understand the background of the 
invention, the inherent drawbacks and problems of the 
related art as well as the solution provided by the 
invention, a general mathematical description of 
quantization processes will be presented, followed by a 
cursory description of possible applications in video
compression and transcoding techniques. 

Given a number x, quantization can be described as
follows:

$$y = y_k \quad \text{if } x \in I_k$$

where $y_k$ is the quantized value of x and all $I_k$ are
ranges like

$$I_k = \{x : x_k \le x < x_{k+1}\}, \quad k = 1, 2, \ldots, L$$

After that, the group of ranges and the values to be
associated to each one of them will be defined.
Starting from the definition of "quantization error" as
follows:

$$e_q(x) = |y_k - x| \quad \text{for } x \in I_k$$

of all the quantization step groups, the optimal one
($I_{opt}$) minimizes the average quantization error $e_q$:

$$I_{opt} = \arg\min_{\{I_k\}} \int_{-\infty}^{+\infty} e_q(x)\, p(x)\, dx$$
where p(x) is the probability distribution of the
independent variable x.

Considering the range $[x_k, x_{k+1}]$, and $y_k = x_k + d$, the
quantization error in the range can be calculated as
follows:

$$\int_{x_k}^{x_{k+1}} e_q(x)\, p(x)\, dx = \int_{x_k}^{x_{k+1}} |x_k + d - x|\, p(x)\, dx$$



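In this way, the quantization error in each
quantization range depends on the distance d of $y_k$
from its left extremity $x_k$. Because the goal is to
minimize this error, the zeros of the first derivative
have to be located as a function of d.
In other words, denoting by F a primitive of a generic integrand f:

$$\frac{\partial}{\partial d}\int_{x_k}^{x_k+d} f(x)\,dx = \lim_{h \to 0} \frac{F(x_k+d+h) - F(x_k+d)}{h} = f(x_k+d)$$

In the same way,

$$\frac{\partial}{\partial d}\int_{x_k+d}^{x_{k+1}} f(x)\,dx = \lim_{h \to 0} \frac{F(x_k+d) - F(x_k+d+h)}{h} = -f(x_k+d)$$

It is now possible to calculate the derivative of
the error with respect to d:

$$\frac{\partial e_q}{\partial d} = \int_{x_k}^{x_k+d} p(x)\,dx - \int_{x_k+d}^{x_{k+1}} p(x)\,dx + (x_k+d)\,[p(x_k+d) + p(x_k+d)] - [(x_k+d)\,p(x_k+d) + (x_k+d)\,p(x_k+d)]$$

$$= \int_{x_k}^{x_k+d} p(x)\,dx - \int_{x_k+d}^{x_{k+1}} p(x)\,dx = 0 \;\Rightarrow\; \int_{x_k}^{x_k+d} p(x)\,dx = \int_{x_k+d}^{x_{k+1}} p(x)\,dx$$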
Therefore, the point of minimum error corresponds
to the median of the range.

In the same way, it is possible to demonstrate
that, starting from a range $[x_k, x_{k+1}]$, the best
subdivision in two different intervals

$$[x_k, x_j] \cup [x_j, x_{k+1}] \quad \text{with } x_k \le x_j \le x_{k+1}$$

is the one that leads to equality of the two following
functions in the two sub-ranges:

$$\int_{x_k}^{x_j} p(x)\,dx = \int_{x_j}^{x_{k+1}} p(x)\,dx$$

From this, $I_{opt}$ represents all the ranges with
equal probability, univocally defined by L.
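The result above (equal-probability ranges, each represented by its median) can be illustrated with a minimal sketch. It assumes, purely for illustration, a zero-mean Laplacian density $p(x) = (\lambda/2)\,e^{-\lambda|x|}$; the function names and the values of `lam` and `levels` are hypothetical and not part of the arrangement described here.

```python
import math

def laplace_cdf(x, lam):
    """CDF of the zero-mean Laplacian p(x) = (lam / 2) * exp(-lam * |x|)."""
    if x < 0:
        return 0.5 * math.exp(lam * x)
    return 1.0 - 0.5 * math.exp(-lam * x)

def laplace_quantile(u, lam):
    """Inverse CDF of the same Laplacian, for 0 < u < 1."""
    if u < 0.5:
        return math.log(2.0 * u) / lam
    return -math.log(2.0 * (1.0 - u)) / lam

def equal_probability_quantizer(lam, levels):
    """Return (edges, medians) for `levels` ranges of equal probability.

    Interior edges sit at the k/levels quantiles; each reconstruction
    value is the median of its range, i.e. the (k + 0.5)/levels quantile.
    The two outermost edges are infinite because the density has
    unbounded support.
    """
    edges = [-math.inf]
    edges += [laplace_quantile(k / levels, lam) for k in range(1, levels)]
    edges.append(math.inf)
    medians = [laplace_quantile((k + 0.5) / levels, lam) for k in range(levels)]
    return edges, medians

# Illustrative values only (assumptions, not taken from a real stream).
edges, medians = equal_probability_quantizer(lam=0.055, levels=8)
```

Each range carries probability 1/L, and each median splits its range into two halves of probability 1/(2L), which is the condition derived above.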

"Quantization" , in the video compression context 
requires that each 16 bit coefficient (with sign) from 
the DCT transform of the prediction error is associated 
to a sub-set of discrete numbers, smaller than the 
original one, reducing, in this way, the spatial 
redundancy of the signal. 

Quantization of the DCT coefficients plays a key
role in compression processes (this being true not just
for the video context), since the final bitrate depends
very strictly on this stage of the process.
Specifically, the DCT transformation concentrates the
energy associated to the input signal (e.g. the images
of a video sequence) into a small number of coefficients,
which represent the lowest spatial frequencies. However,
the DCT transformation does not reduce the amount of
data needed to represent the information. This means
that, by applying a coarse quantization on these
coefficients, a large number of zero coefficients can
be removed from the high frequency region of each
macroblock (where the human eye is less sensitive),
thus achieving a true reduction of information.

This is shown by way of example in figure 2, which 
represents an example of DCT coefficient quantization. 

This is the only step that is not reversible
in the compression chain (i.e. the relevant information
is not transformed but at least partly lost).

In the Intra-Coded macroblocks, briefly "Intra",
belonging to the Intra-Coded frames ("I") or to the
Predicted frames ("P" or "B"), the DC component of each
macroblock (the first coefficient in the upper left
corner) and the AC components (all the other
coefficients) are quantized separately, using the
following rules:




where C(u,v) are the quantized coefficients, F(u,v) are
the DCT coefficients, Q(u,v) is the quantization step,
$Q_F$ is a quantization parameter and the sign is the sign
of F(u,v).

The inverse quantization is obtained from the
following rules:

$$F(0,0) = 8\, C(0,0)$$

$$F(u,v) = \frac{C(u,v)\, Q(u,v)\, Q_F}{8}$$



For those macroblocks which are predicted or
interpolated, belonging thus to Predicted or
Bidirectionally Predicted frames (briefly "P" or "B"
frames), the quantization process is the following:

$$A(u,v) = \frac{16\, F(u,v) \pm \frac{Q(u,v)}{2}}{Q(u,v)}$$

$$C(u,v) = \begin{cases} \dfrac{A(u,v)}{2 Q_F} & Q_F \text{ odd} \\[2ex] \dfrac{A(u,v) \pm 1}{2 Q_F} & Q_F \text{ even} \end{cases}$$

and the sign used is the sign of A(u,v).
The inverse quantization is obtained as follows:

$$F(u,v) = \frac{(2\, C(u,v) + 1)\, Q_F\, Q(u,v)}{16}$$



The rate control algorithm calculates the $Q_F$
parameter, which represents the real quantization
level.

To sum up, the quantization step is where the
compression process becomes lossy, in the sense that
the errors introduced are no longer recoverable. The
total error depends on the spatial position of each
coefficient in the block that contains it, and on the
number of bits already spent from the beginning of the
picture until the current macroblock (because the $Q_F$
parameter can be changed for each macroblock).

The minimum possible error is zero, when the
coefficient being quantized is a multiple of the
quantization step; the maximum possible error is equal
to half the quantization step that contains the
coefficient being quantized (referring to a non linear
quantization scale). This means that if quantization is
too "hard" (the $Q_F$ parameter having a high value) the
resulting image will be appreciably degraded and the
block artifacts visible. On the other hand, if the
quantization is too "soft", the resulting images will
be significantly more detailed, but a higher number of
bits will be required to encode them.

In the MPEG-2 standard, the DCT coefficients
integer range of variability is [-2048, 2047]: the
total number of quantization intervals L, depending on
mQuant (the quantization level parameter, calculated by
the rate control algorithm) is:

$$L = \frac{4096}{mQuant}$$

For the Inter macroblocks, it is not generally
possible to find a probability distribution of the
coefficients (coding the prediction error); in fact,
this depends on the input signal and the motion
estimator characteristics. Recently, it has been
demonstrated that it is possible to approximate a
Laplacian distribution also for this kind of DCT
coefficients, but the variability of its parameters is
much greater than for the Intra case. For this reason a
uniform distribution is currently assumed. The original
coefficient is divided by the value mQuant and
rounded to the nearest integer.

For the Intra macroblocks, the probability
distribution of the DCT coefficients (excluding the DC
coefficient) can be very well approximated by a
Laplacian curve, centered on the zero value.

Referring, by way of example, to the first 100
frames of the standard sequence known as Mobile &
Calendar, the distribution of the corresponding AC-DCT
coefficients may be well approximated by a Laplacian
curve with parameter λ = 0.055. The parameter λ can be
very easily found, considering the Laplacian curve
equation:

$$p(x) = \frac{\lambda}{2}\, e^{-\lambda |x|}$$

Calculating experimentally the variance $\sigma^2$ of the AC
coefficients, the best Laplacian curve fitting the
given points can be found as follows:

$$\sigma^2 = \int_{-\infty}^{+\infty} (x - E(x))^2\, p(x)\, dx = \frac{2}{\lambda^2} \;\Rightarrow\; \lambda = \frac{\sqrt{2}}{\sigma}$$
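The variance-based fit just described can be sketched as follows. This is only an illustration: the samples are synthetic (drawn as the difference of two exponential variables, which yields a Laplacian), and the function name `fit_laplacian_lambda` is hypothetical. A real measurement would use AC-DCT coefficients extracted from a sequence.

```python
import math
import random

def fit_laplacian_lambda(samples):
    """Estimate the Laplacian parameter as sqrt(2) / sigma,
    where sigma**2 is the sample variance of the coefficients."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return math.sqrt(2.0 / variance)

# Synthetic stand-in for AC coefficients (assumption): the difference of
# two independent exponential variables with rate lam is Laplacian with
# the same parameter lam.
rng = random.Random(0)
true_lam = 0.055
samples = [rng.expovariate(true_lam) - rng.expovariate(true_lam)
           for _ in range(50_000)]
lam_hat = fit_laplacian_lambda(samples)
```

With enough samples, `lam_hat` should land close to the value used to generate the data.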



Theoretically speaking, because a coefficient is
sought to be quantized with quantization parameter
mQuant, one must find all the 4096/mQuant intervals with the
same probability, and, for each one of them, the median
value, the true goal being minimizing not the absolute
quantization error, but rather its average value.
Moreover, using for each interval the median value is
important also for the subsequent VLC compression
(shorter words will be associated with more frequent
values): this increases the maximum quantization error.
As this is not a probable event, better compression
with a minimized mean square error is allowed.

For practical implementations, it is in any case 
preferable to simplify the quantizer, using again the 
one used for the Inter case. To do that, it is 
necessary to apply some modifications to the input 
coefficients, to adapt them to the different 
probability curve. In the Test Model Five (TM5), all
the AC coefficients are pre-quantized using a matrix of
fixed coefficients that eliminates all the frequencies
that are not perceptible; after that, adaptive
quantization is applied, proportional to the parameter
mQuant needed.
Analyzing the function, each AC-DCT coefficient is
quantized following this expression:

$$QAC = \frac{\tilde{ac} + \frac{3}{4} \cdot mquant}{2 \cdot mquant} = \frac{\tilde{ac} + mquant}{2 \cdot mquant} - \frac{1}{8}$$

where $\tilde{ac}$ denotes the pre-quantized AC coefficient.



This means that to each quantization interval (δ)
will be associated a value which does not represent the
mean value, but the mean value decremented by 1/8. This
confirms that, since the probability distribution is
not uniform in each interval (but can be approximated
by a Laplacian curve), the most representative value of
the interval itself is the median, which also minimizes
the quantization error.
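The effect of the 1/8 offset can be made concrete with a short sketch. It assumes truncation toward zero and symmetric handling of negative coefficients, which the expression above does not spell out; both function names are hypothetical. With a step of 2 * mquant, plain rounding switches levels at (n - 1/2) * step, whereas the offset form switches at (n - 3/8) * step.

```python
def quantize_ac(ac_tilde, mquant):
    """Quantize a pre-scaled AC coefficient as
    trunc((|ac~| + 3/4 * mquant) / (2 * mquant)), restoring the sign.
    Truncation and sign symmetry are assumptions of this sketch."""
    magnitude = int((abs(ac_tilde) + 0.75 * mquant) / (2 * mquant))
    return magnitude if ac_tilde >= 0 else -magnitude

def quantize_round(ac_tilde, mquant):
    """Reference: plain rounding to the nearest multiple of 2 * mquant."""
    magnitude = int(abs(ac_tilde) / (2 * mquant) + 0.5)
    return magnitude if ac_tilde >= 0 else -magnitude

# With mquant = 8 the step is 16: plain rounding reaches level 1 at 8,
# the offset quantizer only at 10 = (1 - 3/8) * 16.
levels_offset = [quantize_ac(c, 8) for c in (8, 9, 10)]
levels_round = [quantize_round(c, 8) for c in (8, 9, 10)]
```

The widened zero bin is consistent with a distribution peaked at zero, where the median of each interval lies closer to the origin than its midpoint.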

As already indicated, MPEG2 standard defines 
syntax and semantics of the transmitted bitstream and 
the functionalities of the decoder. However, the 
encoder is not strictly standardized: any encoder that 
produces a valid MPEG2 bitstream is acceptable. The
standard puts no constraints on important processing 
steps such as motion estimation, adaptive scalar 
quantization, and bit rate control. 

This last issue plays a fundamental role in actual
systems working at Constant Bit Rate (briefly CBR). Due
to the intrinsic structure of MPEG2, the final
bitstream is produced at variable bit rate, hence it
has to be transformed to constant bit rate by the
insertion of an output buffer which acts as feedback
controller. The buffer controller aims at achieving a
target bit rate with consistent visual quality: it
monitors the amount of bits produced at a
macroblock-by-macroblock level and dynamically adjusts the
quantization parameters for the subsequent ones
according to its fullness status and to the image
complexity.

Bit rate control is a central problem in designing
moving pictures compression systems. It is essential to
ensure that the number of bits used for a group of
pictures (GOP) is as close as possible to a
predetermined one. This is especially relevant in
magnetic recording, and more in general, in those
applications where strong constraints exist on
instantaneous bitrate. In fact, in order to realize
playback "trick" modes, such as "fast forward", it is
necessary to start I-pictures at regularly spaced
positions on the tape. In this kind of reproduction
only the Intra pictures can be visualized: they allow
random access to the sequence since they are coded
independently. Search is performed with a jump close to
the GOP (Group Of Pictures) start code and then with a
read step in the bitstream until the image starts.
Hence, only the first image of the GOP is to be
decoded.

A constant bit rate per GOP is also an 
advantageous solution in the case of bitstream editing. 
It makes it possible to take a small part of the 
sequence, modify, re-encode and put it exactly where it
was in the bitstream. Bit rate control algorithms based 
on pre-analysis can produce output bit rates that are 
very close to the desired one. They use information 
from a pre-analysis of the current picture, where such 
pre-analysis is a complete encoding of the image with a 
constant quantizer. Since the current picture is 
analyzed and then quantized, scene changes have no 
influence on the reliability of the pre-analysis. 

A procedure for controlling the bitrate of the
Test Model by adapting the macroblock quantization
parameter is known as the Test Model 5 (TM5) rate
control algorithm.

The algorithm works in three steps: 

i) Target bit allocation: this step estimates the
number of bits available to code the next picture; it
is performed before coding the picture.

ii) Rate control: this step sets, by means of a
"virtual buffer", the reference value of the
quantization parameter for each macroblock.

iii) Adaptive quantization: this step modulates
the reference value of the quantization parameter
according to the spatial activity in the macroblock to
derive the value of the quantization parameter, mquant,
which is used to quantize the macroblock.

A first phase in the bit allocation step is 
complexity estimation. After a picture of a certain 
type (I, P, or B) is encoded, the respective "global 
complexity measure" (Xi, Xp, or Xb) is updated as: 

Xi = Si Qi, Xp = Sp Qp, Xb = Sb Qb 

where Si, Sp, Sb are the numbers of bits generated by
encoding this picture and Qi, Qp and Qb are the average
quantization parameters computed by averaging the actual
quantization values used during the encoding of all
the macroblocks, including the skipped macroblocks.
The initial values are:

Xi = 160 * bit_rate / 115

Xp = 60 * bit_rate / 115

Xb = 42 * bit_rate / 115

where bit_rate is measured in bits/s.
Subsequently, in the picture target-setting phase,
the target number of bits for the next picture in the
Group of Pictures (Ti, Tp, or Tb) is computed as:

$$T_i = \max\left\{ \frac{R}{1 + \frac{N_p X_p}{X_i K_p} + \frac{N_b X_b}{X_i K_b}},\ \frac{bit\_rate}{8 \cdot picture\_rate} \right\}$$

$$T_p = \max\left\{ \frac{R}{N_p + \frac{N_b K_p X_b}{K_b X_p}},\ \frac{bit\_rate}{8 \cdot picture\_rate} \right\}$$

$$T_b = \max\left\{ \frac{R}{N_b + \frac{N_p K_b X_p}{K_p X_b}},\ \frac{bit\_rate}{8 \cdot picture\_rate} \right\}$$
Where:

Kp and Kb are "universal" constants dependent on
the quantization matrices; acceptable values for these
are Kp = 1.0 and Kb = 1.4.

R is the remaining number of bits assigned to the
Group of Pictures. R is updated as follows.

After encoding a picture, R = R - Si,p,b where
Si,p,b is the number of bits generated in the picture
just encoded (picture type is I, P or B).

Before encoding the first picture in a Group of
Pictures (an I-picture):

R = G + R

G = bit_rate * N / picture_rate

N is the number of pictures in the Group of
Pictures.

At the start of the sequence R = 0.

Np and Nb are the number of P-pictures and
B-pictures remaining in the current Group of Pictures in
the encoding order.
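The complexity estimation and target-setting rules above can be sketched as follows. The function names, the dictionary layout and the example figures (4 Mbit/s, 25 pictures/s, a 12-picture GOP with 3 P-pictures and 8 B-pictures) are assumptions chosen for illustration only.

```python
def initial_complexities(bit_rate):
    """Initial global complexity measures Xi, Xp, Xb (bit_rate in bits/s)."""
    return {"I": 160 * bit_rate / 115,
            "P": 60 * bit_rate / 115,
            "B": 42 * bit_rate / 115}

def update_complexity(X, picture_type, bits_generated, average_q):
    """After encoding a picture, set X = S * Q for that picture type."""
    X[picture_type] = bits_generated * average_q

def target_bits(picture_type, R, Np, Nb, X, bit_rate, picture_rate,
                Kp=1.0, Kb=1.4):
    """Target number of bits Ti, Tp or Tb for the next picture."""
    lower_bound = bit_rate / (8 * picture_rate)
    if picture_type == "I":
        denom = 1 + Np * X["P"] / (X["I"] * Kp) + Nb * X["B"] / (X["I"] * Kb)
    elif picture_type == "P":
        denom = Np + Nb * Kp * X["B"] / (Kb * X["P"])
    else:
        denom = Nb + Np * Kb * X["P"] / (Kp * X["B"])
    return max(R / denom, lower_bound)

# Illustrative figures (assumptions).
bit_rate, picture_rate, N = 4_000_000, 25, 12
X = initial_complexities(bit_rate)
R = 0 + bit_rate * N / picture_rate      # R = G + R before the first GOP
Np, Nb = 3, 8
t_i = target_bits("I", R, Np, Nb, X, bit_rate, picture_rate)
t_p = target_bits("P", R, Np, Nb, X, bit_rate, picture_rate)
t_b = target_bits("B", R, Np, Nb, X, bit_rate, picture_rate)
```

For simplicity the three targets are evaluated with the same R, Np and Nb; in an actual encoding loop R is decremented and Np or Nb updated after each picture, as described above.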

A subsequent step in the process is local control. 

Before encoding macroblock j (j >= 1), the
"fullness" of the appropriate virtual buffer is
computed as:

$$d_j^i = d_0^i + B_{j-1} - \frac{T_i\,(j-1)}{MB\_cnt}$$

or

$$d_j^p = d_0^p + B_{j-1} - \frac{T_p\,(j-1)}{MB\_cnt}$$

or

$$d_j^b = d_0^b + B_{j-1} - \frac{T_b\,(j-1)}{MB\_cnt}$$

depending on the picture type, where:

$d_0^i$, $d_0^p$, $d_0^b$ are initial fullnesses of virtual
buffers - one for each picture type.

$B_j$ is the number of bits generated by encoding all
macroblocks in the picture up to and including j.

MB_cnt is the number of macroblocks in the
picture.

$d_j^i$, $d_j^p$, $d_j^b$ are the fullnesses of virtual buffers
at macroblock j - one for each picture type.

The final fullness of the virtual buffer ($d_j^i$,
$d_j^p$, $d_j^b$ : j = MB_cnt) is used as $d_0^i$, $d_0^p$, $d_0^b$ for encoding
the next picture of the same type.

Next, compute the reference quantization parameter
Qj for macroblock j as follows:

$$Q_j = \frac{d_j \cdot 31}{r}$$

where the "reaction parameter" r is given by r = 2 *
bit_rate / picture_rate and dj is the fullness of the
appropriate virtual buffer.

The initial value for the virtual buffer fullness
is:

$$d_0^i = 10 \cdot r / 31$$
$$d_0^p = K_p \cdot d_0^i$$
$$d_0^b = K_b \cdot d_0^i$$
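The virtual-buffer step can be sketched in a few lines. Function names and the example figures (4 Mbit/s, 25 pictures/s, a 500 000-bit target, 1350 macroblocks per picture) are hypothetical and serve only to show how the formulas fit together.

```python
def reaction_parameter(bit_rate, picture_rate):
    """r = 2 * bit_rate / picture_rate."""
    return 2 * bit_rate / picture_rate

def initial_fullness(r, Kp=1.0, Kb=1.4):
    """Initial virtual buffer fullness for I, P and B pictures."""
    d0_i = 10 * r / 31
    return {"I": d0_i, "P": Kp * d0_i, "B": Kb * d0_i}

def buffer_fullness(d0, bits_before_j, target, j, mb_cnt):
    """d_j = d_0 + B_(j-1) - T * (j - 1) / MB_cnt, for macroblock j >= 1."""
    return d0 + bits_before_j - target * (j - 1) / mb_cnt

def reference_quantizer(d_j, r):
    """Q_j = d_j * 31 / r."""
    return d_j * 31 / r

# Illustrative figures (assumptions).
r = reaction_parameter(4_000_000, 25)
d0 = initial_fullness(r)
# First macroblock of the first I-picture: no bits spent yet (B_0 = 0).
q_first = reference_quantizer(buffer_fullness(d0["I"], 0, 500_000, 1, 1350), r)
# Macroblock 11 after spending far more than its proportional share.
q_over = reference_quantizer(buffer_fullness(d0["I"], 40_000, 500_000, 11, 1350), r)
```

With the initial fullness above, the first reference value comes out at about 10; spending more bits than the proportional share of the target raises the fullness and therefore the reference quantization parameter.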

A third step in the process is adaptive 
quantization. 

A spatial activity measure for the macroblock j
is computed from the four luminance frame-organised
sub-blocks and the four luminance field-organised
sub-blocks using the intra (i.e. original) pixel values:

$$act_j = 1 + \min_{sblk = 1, \ldots, 8} (var\_sblk)$$

where

$$var\_sblk = \frac{1}{64} \sum_{k=1}^{64} (P_k - P\_mean)^2$$

$$P\_mean = \frac{1}{64} \sum_{k=1}^{64} P_k$$

and Pk are the pixel values in the original 8*8 block.
Normalized actj:

$$N\_act_j = \frac{2 \cdot act_j + avg\_act}{act_j + 2 \cdot avg\_act}$$

avg_act is the average value of actj over the last picture
to be encoded. On the first picture, avg_act = 400.
Then mquantj is obtained as:

mquantj = Qj * N_actj

where Qj is the reference quantization parameter
obtained in step 2. The final value of mquantj is
clipped to the range [1..31] and is used and coded as
described in sections 7, 8 and 9 in either the slice or
macroblock layer.
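A minimal sketch of the adaptive quantization step follows. Sub-blocks are represented as flat lists of 64 pixel values; the extraction of frame-organised and field-organised sub-blocks from a macroblock is omitted, and integer rounding of mquant is left out. All names and test patterns are hypothetical.

```python
def block_variance(pixels):
    """Variance of the pixel values of one 8*8 sub-block (64 values)."""
    n = len(pixels)
    mean = sum(pixels) / n
    return sum((p - mean) ** 2 for p in pixels) / n

def spatial_activity(sub_blocks):
    """act_j = 1 + minimum variance over the eight luminance sub-blocks."""
    return 1 + min(block_variance(b) for b in sub_blocks)

def normalized_activity(act, avg_act=400):
    """N_act_j = (2 * act_j + avg_act) / (act_j + 2 * avg_act)."""
    return (2 * act + avg_act) / (act + 2 * avg_act)

def mquant(q_ref, n_act):
    """mquant_j = Q_j * N_act_j, clipped to the range [1..31]."""
    return min(31, max(1, q_ref * n_act))

# Two synthetic macroblocks (assumptions): one flat, one highly textured.
flat = [128] * 64
textured = [0, 255] * 32
act_flat = spatial_activity([flat] * 8)
act_busy = spatial_activity([textured] * 8)
mq_flat = mquant(10, normalized_activity(act_flat))
mq_busy = mquant(10, normalized_activity(act_busy))
```

The normalized activity stays between 0.5 and 2, so flat macroblocks receive a finer quantizer than the reference value and highly textured ones a coarser quantizer.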

This known arrangement has a number of drawbacks. 
First of all, step 1 does not handle scene changes 
efficiently. 

Also, a wrong value of avg_act is used in step 3 
(adaptive quantization) after a scene change. 

Finally, VBV compliance is not guaranteed. 

Normally, the re-quantization process consists of
a block of inverse quantization (IQ) followed by a
quantization block (Q). Care must be taken with
this operation, because the quantization errors can be
very significant, and they can degrade the images.
Optimizations to this process are possible.

When a uniform quantizer is used (as in TM5) , it 
is possible to fuse together the two blocks in only one 
procedure, reducing both the computational costs and 
the errors related to this operation. 

Starting from the TM5 quantizer described above,
the Inter and Intra quantization error can be analyzed
as follows.

Consider a coefficient C, two quantization
parameters A and B (with A<B) and the quantized values $C_A$
and $C_B$ of C with respect to A and B.

Designating $C_{AB}$ the re-quantization of $C_A$ with
respect to B:



The re-quantized coefficient $C_{AB}$ must represent C
with the minimum error possible, with respect to a
direct quantization by the factor B. It has been
demonstrated that this is true when directly quantizing C
with respect to B, in other words obtaining the value $C_B$.

The re-quantization error is the difference
between $C_{AB}$ and $C_B$.

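It is possible to demonstrate that:

$$C_A = \frac{C}{A} + \varepsilon_A, \qquad C_B = \frac{C}{B} + \varepsilon_B$$

but also:

$$C_{AB} = \frac{C_A \cdot A}{B} + \varepsilon_{AB}$$

consequently:

$$|C_{AB} - C_B| = \left| \frac{C}{B} + \varepsilon_A \frac{A}{B} + \varepsilon_{AB} - \frac{C}{B} - \varepsilon_B \right| = \left| \varepsilon_A \frac{A}{B} + \varepsilon_{AB} - \varepsilon_B \right|$$

where $\varepsilon_A$, $\varepsilon_B$ and $\varepsilon_{AB}$ denote the rounding errors of the respective quantization steps.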
Therefore, the re-quantization error is bigger
when the difference between the values A and B is
smaller.
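The dependence of the re-quantization error on the distance between A and B can be checked numerically. The sketch below assumes a plain uniform quantizer with rounding to the nearest level (a simplification of the TM5 quantizer discussed above) and counts how often the cascade A then B disagrees with direct quantization by B; the function names and the step values are illustrative assumptions.

```python
def quantize(value, step):
    """Round to the nearest level (halves away from zero), integers only."""
    magnitude = (2 * abs(value) + step) // (2 * step)
    return magnitude if value >= 0 else -magnitude

def requantize(value, step_a, step_b):
    """Quantize by step_a, reconstruct, then quantize again by step_b."""
    c_a = quantize(value, step_a)
    return quantize(c_a * step_a, step_b)

def mismatch_rate(step_a, step_b, lo=-2048, hi=2047):
    """Fraction of coefficients for which C_AB differs from C_B."""
    values = range(lo, hi + 1)
    mismatches = sum(requantize(c, step_a, step_b) != quantize(c, step_b)
                     for c in values)
    return mismatches / len(values)

far = mismatch_rate(2, 16)     # A much smaller than B
near = mismatch_rate(14, 16)   # A close to B
```

Over the full coefficient range, the mismatch rate for A = 14 is several times the rate for A = 2, in line with the statement that the error grows as A approaches B.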

Object and summary of the invention

The object of the invention is thus to provide 
alternative arrangements overcoming the drawbacks and
limitations of the prior art arrangements considered in 
the foregoing. 

According to the present invention, this object is 
achieved by means of a method having the features set 
forth in the claims that follow. The invention also 
relates to a corresponding system as well as computer 
program product directly loadable in the memory of a 
digital computer and comprising software code portions 
for performing the method of the invention when the 
product is run on a computer. 

Brief description of the drawings

The invention will now be described, by way of 
example only, with reference to the annexed figures of 
drawing, wherein: 

- figures 1 and 2, concerning the related art,
were already described in the foregoing,

- figures 3 and 4, with figure 3 including two
portions designated a) and b), respectively, show a
uniform quantization arrangement and the corresponding
error,

- figure 5 shows an arrangement for uniform
quantization using subtractive dithering,

- figure 6 shows an arrangement for uniform
quantization using non-subtractive dithering,

- figure 7 is a block diagram of a dithered
re-quantizer,

- figure 8 is a block diagram of a downsampling
transcoder,

- figure 9 is a three-dimensional diagram showing
the relationship of output bitrate to input bitrate in
an arrangement disclosed herein, and

- figure 10 shows a basic quality evaluation
scheme for use in the context of the invention.

Dithered quantization is a technique where a
particular noisy signal, called dither, is summed to
the input signal before the quantization step, this
step being usually carried out as a uniform
quantization step.

As described before, a uniform quantizer
implements a correspondence between an analog signal
(continuous) and a digital signal (discrete), formed by
the collection of levels with the same probability.

In the case of MPEG-2 signals, the input process
can be considered as a stationary process $X_n$ with $n \in Z$,
where Z represents the integer numbers.

As shown in figure 3a, the output of a quantizer
block q fed with an input signal $X_n$ is the process
$\hat{X}_n = q(X_n)$. Figure 3b shows both the typical relationship
of $q(X_n)$ to $X_n$ and the quantization error $e_n$.

In a uniform quantizer, the hypothesis is that the
quantization error is equal to $e_n = q(X_n) - X_n$. For this
reason, the difference between input and output is a
sequence of random variables, following a uniform
distribution, uncorrelated between them and with the
input.

In this case, one can model the quantizer block
q(X) as in figure 4, where $e_n$ is a sequence of uniform
random variables, independent and all distributed in
the same way.

This approximation can be acceptable, inasmuch as
the number N of quantization levels is high: this
condition corresponds to a small quantization step Δ
and the probability function of the input signal is
smoothed (Bennett approximation).

Using a dithering signal as an input practically 
corresponds to forcing this condition even if not 
exactly met. 

Two different types of dithering are available:
subtractive and non-subtractive.

In the former case, as shown in figure 5, a random
(or pseudo-random) noise signal is added to the input
before quantization, $U_n = X_n + W_n$, and is subtracted after
the inverse quantization block, in order to reconstruct
the input signal, removing the artifacts due to the non
linear characteristic of the quantizer.

When non-subtractive dithering is used as shown in
figure 6, the input signal of the quantizer is the
same, but no correction is applied to the inverse
quantized signal.

The introduction of such kind of error modifies
the quantization error definition as follows:

$$e_n = q(X_n + W_n) - (X_n + W_n)$$

Therefore, the general difference between the
original input and the final output (the quantization
error) will be:

$$\varepsilon_n = q(X_n + W_n) - X_n = e_n + W_n$$
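The two error definitions can be explored with a small simulation of non-subtractive dithering. The sketch assumes a dither uniformly distributed over one quantization step and a rounding uniform quantizer; these choices, the function names and the numeric values are illustrative assumptions, not requirements of the arrangement described.

```python
import math
import random

def uniform_quantize(x, step):
    """Uniform quantizer: reconstruction value nearest to x."""
    return step * math.floor(x / step + 0.5)

def dithered_quantize(x, step, rng):
    """Non-subtractive dithering: add W before quantizing, never remove it.
    Returns the quantized value and the dither sample that was used."""
    w = rng.uniform(-step / 2, step / 2)
    return uniform_quantize(x + w, step), w

# A constant input lying between two levels (illustrative values).
rng = random.Random(1)
step, x = 1.0, 0.3
outputs = [dithered_quantize(x, step, rng)[0] for _ in range(20_000)]
mean_output = sum(outputs) / len(outputs)
plain_output = uniform_quantize(x, step)
```

Without dither the output is stuck at the same level and the error is always the same; with dither the individual outputs are noisier, but their average tracks the input, which is the decorrelation effect being sought.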

Between the two types of dithering strategies,
using the non-subtractive scheme is preferable for a
number of reasons.

First of all, even though having several
advantages, subtractive dithering is difficult to
implement in a real system, because the receiver needs
to be very tightly synchronized with the transmitter,
and this is not the case.

Moreover, transmitting the generated random
sequence together with the sequence is also hardly
acceptable, as this will occupy a lot of space in the
compressed stream, and this only to transmit noise.

Secondly, subtractive dithering implies high
arithmetic precision (so a large number of bits), but
generally, integer variables are used.

Several other factors need be considered when 
using a dithered approach for transcoding. 
10 A first factor is the target bitrate: data 

compression is obtained using an efficient VLC of the 
quantized DCT coefficients after the Run-Length coding. 
Analyzing re-quantization and the effects deriving from 
dithering, shows that applying this technique to all 
15 the DCT coefficients may not be advantageous. 

This is because in the high frequency part of the
DCT coefficients matrix, several zero coefficients will
be modified to non-zero coefficients: this complicates the
task of the subsequent VLC step, as these non-zero
coefficients can no longer be compressed
to one symbol as would be the case for zero
coefficients.

For this reason, the output bit-rate will be
higher: the rate controller will therefore increase the
quantization parameter mQuant in order to follow the
fixed target bit-rate, which would adversely affect the
final image quality.

The arrangement shown in figure 7 implies a double
re-quantization cycle: for each coefficient considered,
a value re-quantized with the normal procedure (i.e.
without dither) is calculated.

If the coefficient is zero, which is ascertained
in a block downstream of the uniform quantizer q1, this
will be directly fed to the final stream via a
multiplexer module 102.

Otherwise, for the non-zero coefficients, and
only for these, the re-quantized value is calculated
again with the dithering procedure.

Specifically, in the block diagram of figure 7
reference 104 indicates a summation node (adder) where
a dither signal is added to the AC-DCT signal upstream
of another uniform quantizer q2, whose output is fed to
the multiplexer 102.

Quite obviously, the "parallel" arrangement shown
in figure 7, which provides for the use of two quantizers
q1 and q2, also lends itself to be implemented as a
time-shared arrangement using a single quantizer only.
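
The double re-quantization cycle of figure 7 can be sketched as follows. This is a minimal illustration under stated assumptions: the quantizers q1 and q2 are modelled as the same round-to-nearest function (the time-shared variant), and the names `requantize` and `requantize_selective` are hypothetical.

```python
def requantize(coeff, step):
    """Plain uniform re-quantization of a coefficient to an integer level."""
    return int(round(coeff / step))


def requantize_selective(coeff, step, dither):
    """Zero levels bypass the dither; non-zero levels are recomputed with it.

    The first call plays the role of q1 (no dither); the second call plays
    the role of q2 fed by the adder 104. The final selection corresponds to
    the multiplexer 102.
    """
    level = requantize(coeff, step)
    if level == 0:
        return 0  # keep the zero so that run-length/VLC coding is unaffected
    return requantize(coeff + dither, step)


print(requantize_selective(3.0, 16.0, 7.0))   # small coefficient: dither ignored
print(requantize_selective(20.0, 16.0, 6.0))  # non-zero coefficient: dither applied
```

Keeping the zero levels untouched is what prevents the high-frequency zeros from being turned into non-zero symbols by the added noise.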

The type of dither noise added before the
quantization is significant. Its characteristics must
be such as to decorrelate the final quantization error
from the input of the quantizer (the dithered original
signal).

Different types of noise may be used by adapting
the characteristic function of the process that
generates them: gaussian, uniform, sinusoidal and
triangular.

Any known procedure for pseudo-random variable
generation with uniform distribution can be used to
advantage in order to subsequently modify its
distribution to obtain e.g. a gaussian or triangular
distribution.

In the case considered, a triangular distribution
gives the best results, triangular noise being obtained
as the sum of two independent, uniformly distributed
pseudo-random variables.
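
A possible way of generating such triangular noise is sketched below. The amplitude of each uniform variable (half a quantization step on either side of zero) is an assumption made for the sake of the example, as is the function name `triangular_dither`.

```python
import random


def triangular_dither(rng, step):
    """Sum of two independent uniform variables on [-step/2, step/2].

    The sum has a triangular probability density centred on zero and
    confined to [-step, step].
    """
    u1 = rng.uniform(-step / 2, step / 2)
    u2 = rng.uniform(-step / 2, step / 2)
    return u1 + u2


rng = random.Random(1)  # fixed seed so that the sequence is reproducible
samples = [triangular_dither(rng, 4.0) for _ in range(10000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```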

The ratio between the input and the output mQuant
is to be taken into account, in that it is not always
convenient to insert the noise signal before the linear
quantization.






From another point of view, when the input and the 
output mQuant are similar (equal or multiples) , 
randomly correcting the coefficients may not be 
advantageous, so the dither is not applied in this 
condition. 
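
This condition can be expressed as a simple test. The sketch below assumes that "multiples" means that the output mQuant is an integer multiple of the input mQuant; both this reading and the name `dither_enabled` are assumptions.

```python
def dither_enabled(mquant_in, mquant_out):
    """Return False when the output mQuant equals, or is an integer
    multiple of, the input mQuant; True otherwise."""
    if mquant_in <= 0 or mquant_out <= 0:
        raise ValueError("mquant values must be positive")
    return mquant_out % mquant_in != 0


print(dither_enabled(4, 4), dither_enabled(4, 8), dither_enabled(4, 6))
```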

Different implementations of the output bitrate 
controller are thus possible for transcoding, with or 
without image size downsampling . 

The Constant Bit Rate (CBR) approach, rather than
the Variable Bit Rate (VBR), is usually preferred: CBR
is in fact representative of the real worst case, and,
in general, a variable bit rate control algorithm can
be regarded as a constant one where the parameters are
relaxed.

The transcoding process is useful for decreasing
the bit rate of source data, in order, typically, to
permit the contents to be conveyed over different
channels with different available bandwidths, without
giving rise to a long latency due to the recoding
process.

A rate control algorithm can be derived from the 
TM5 approach and adapted by using e.g. the same level 
of local feedback (picture level) and the same global 
target bit calculation (GOP level) . 

For the complexity calculation X_i, instead, it is
necessary to distinguish between those bits needed
for the so-called overhead (basically the headers, the
motion vectors, etc.) and those bits allocated for the
DCT coefficients, which are more correlated with the
real image complexity.

The incoming bit-stream is already quantized using
the visibility matrices, and the chosen quantization
parameter "mquant" carries the information of the local
quality of each single macroblock. From this one can
assume that the only control variable is the
quantization mquant:

q_j = mquant

This decision is useful in order to obtain a
more stable global control.

Having only one variable to be controlled, the
dynamic range thereof is over a one-dimensional domain,
where it is easier to work (also from the
implementation point of view). Moreover, the
macroblock activity is not recalculated and the
rounding errors due to the visibility matrix
multiplications and divisions can be avoided. All the
calculations are performed in fixed point, with a
limited dynamic range.

To stabilize the system, a preanalysis block is 
added between the global control and the local one. 

A viable arrangement is a mixed feedback and 
feedforward approach. 

Upstream of the local control loop, a preanalysis
routine is inserted, where each single picture is
quantized (picture-preanalysis) with a hypothetical
value of mquant (chosen experimentally after several
simulations): at this point it is possible to count how
many bits are spent in this condition, and take
advantage of this information. The preanalysis result
is called BUP (Bit Usage Profile): the following final
quantization routine can adjust the mquant used, basing
its decisions on these values.

Summarizing, preanalysis provides information to
the local control routine: this is not only a
complexity measure of each picture, but also an
estimate of the split between the number of bits spent
for coding the DCT coefficients and the bits spent for
the overhead (headers, motion vectors), which are a
structurally fixed payload unless the output standard
is changed.

Locally, instead of a proportional control (as
is the case in TM5), a proportional-integrative
(PI) control is used, e.g.:



where e(t) is the instantaneous error function:
e(t) = y°(t) - y(t). K_p is called the proportional action
coefficient, T_i is the integration time (this must not
be confused with the target bits) and then the
constant K_i is the ratio between K_p and T_i, called the
integral action constant.

The two constants K_p and K_i indicate the reactivity
of the controller with respect to the proportional and
integrative error. In this case, the only observable
variable is the generated number of bits. A proper
index that can measure the real quality of the coded
images does not exist. So one may assume that y°(t) is a
distribution of bits as follows:



This type of control reduces the effect of a
systematic error over the GOP under transcoding. For
output bit rates higher than 4 Mbit/s, K_i and K_p can be
assumed as constants. From the experiments, the mquant
values very rarely approach the limit of the linear
quantization "staircase".

In the global control level, the target bits are
assigned for each single picture of a GOP. In the
implemented rate control the assumption is made, as in
TM5, that image complexity can be correlated with its
predecessor of the same type I, P or B.
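
The proportional-integrative local control introduced above can be illustrated by a discrete-time sketch. The textbook PI law u = K_p·e + K_i·Σe is assumed here, with K_i = K_p / T_i and e = y° - y as defined in the text; the class name, the unit time step and the way the output would be mapped onto mquant are assumptions and are not taken from the arrangement described.

```python
class PIController:
    """Discrete-time PI controller (textbook form, assumed)."""

    def __init__(self, kp, ti):
        self.kp = kp            # proportional action coefficient K_p
        self.ki = kp / ti       # integral action constant K_i = K_p / T_i
        self.integral = 0.0     # running sum of the error

    def update(self, target_bits, produced_bits, dt=1.0):
        """Return the control output for one step.

        error = y°(t) - y(t): bits expected so far minus bits generated.
        """
        error = target_bits - produced_bits
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral


ctrl = PIController(kp=0.5, ti=2.0)
print(ctrl.update(100, 80))   # under-spending: positive correction
print(ctrl.update(100, 100))  # no instantaneous error: integral term only
```

A negative output would indicate that more bits than expected have been generated, so that a hypothetical mapping would raise mquant accordingly.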

The calculation of the complexity and the targets
is performed differently from TM5. The assumption is
made that in the current GOP there are R available bits
and k pictures already coded so that:



n=0 



where R_l are the remaining bits (left) to be used to
encode the following N-k pictures. If t[n] is the
target for the picture n of the GOP, then:



tf-\ 



and then: 



R_l = N_I T_I + N_P T_P + N_B T_B
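
One reading of the three relations above that is consistent with the surrounding definitions is given below. It is only a hedged reconstruction: the symbol s[n] (bits actually spent on the already coded picture n) is an assumption introduced here, and N_I, N_P, N_B are taken to be the numbers of I, P and B pictures still to be coded, with T_I, T_P, T_B their respective targets.

```latex
R_l = R - \sum_{n=0}^{k-1} s[n], \qquad
R_l = \sum_{n=k}^{N-1} t[n], \qquad
R_l = N_I T_I + N_P T_P + N_B T_B
```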



For any picture type (i), the target bits are the
sum of the bits spent for the overhead (O_i) and the
bits spent for the DCT coefficients (C_i):

T_i = C_i + O_i



With these definitions, the image complexity X_i
can be calculated as follows:

X_i = C_i · Q_i

where Q_i represents the average mquant (from the
preanalysis) and C_i is related only to the bits spent
for the DCT coefficients encoding.

The proportional constants K_IP and K_IB can be
determined as follows:

K_IP = Q_P / Q_I ;  K_IB = Q_B / Q_I



The expressions for the target bits used for the
global control level are then derived, obtaining:



C, = R ' X i 

_ c r x P 



B 



Kp B ' Kip * Xj 
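
A possible derivation of such targets is sketched below, under stated assumptions. Calling R' the bits left for the DCT coefficients once the overhead has been deducted, so that R' = N_I·C_I + N_P·C_P + N_B·C_B, and assuming X_i = C_i·Q_i together with Q_P = K_IP·Q_I and Q_B = K_IB·Q_I, one obtains C_P = C_I·X_P / (K_IP·X_I), C_B = C_I·X_B / (K_IB·X_I) and C_I = R' / (N_I + N_P·X_P/(K_IP·X_I) + N_B·X_B/(K_IB·X_I)). The function name and the numeric values are hypothetical; the exact expressions used in the arrangement described may differ.

```python
def coefficient_targets(r_coeff, n, x, k_ip, k_ib):
    """Split r_coeff (bits left for DCT coefficients) among picture types.

    n and x are dictionaries keyed by 'I', 'P', 'B' holding the number of
    pictures still to be coded and the complexity of each picture type.
    """
    ratio_p = x["P"] / (k_ip * x["I"])   # C_P / C_I
    ratio_b = x["B"] / (k_ib * x["I"])   # C_B / C_I
    c_i = r_coeff / (n["I"] + n["P"] * ratio_p + n["B"] * ratio_b)
    return {"I": c_i, "P": c_i * ratio_p, "B": c_i * ratio_b}


# Hypothetical GOP: 1 I, 3 P and 8 B pictures, 900 bits for the coefficients.
targets = coefficient_targets(
    900.0, {"I": 1, "P": 3, "B": 8}, {"I": 4.0, "P": 2.0, "B": 1.0}, 1.0, 1.0
)
print(targets)
```

The per-picture targets, weighted by the number of pictures of each type, add up to the available coefficient bits by construction.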



Even though the MPEG-2 standard (Main Profile @ Main
Level at standard TV resolution) allows transmissions
with data rates up to 15 Mbit/s, the real lower limit of
its applicability range (in order to always obtain good
image quality) is about 4 Mbit/s: below that limit,
the visual quality is not good enough, and different
processing techniques need to be applied.

One possibility is to reduce the frame rate by simply
skipping some frames; another, more complex approach
that also preserves more "global" sequence quality, is
to downsize each image, reducing its dimension to 1/2
or 1/4.

An arrangement applying that principle is shown in 
figure 8, where references IS and OS indicate the video 
input and output sequences, respectively. 

Reference 200 designates the sequence GOP header
that feeds a sequence GOP data delay memory 202, that
in turn feeds an output multiplexer 204.

The header 200 also feeds a picture header 206 that,
via a multiplexer 208, feeds a local cache memory 210
adapted to cooperate with the multiplexer 204 as well
as still another multiplexer 212.

The multiplexer 212 receives input signals from the
multiplexer 208 and the memory 210 and feeds them to a
processing chain including a cascaded arrangement of:

- an inverse VLC (I-VLC) block 214,

- an inverse RL (I-RL) block 216, 

- a low-pass filter 218, 

- a 1:2 downsampler block 220, 

- an inverse quantizer 222 followed by a quantizer 
224, 

- a RL coding block 226, 

- a VLC coding block 228, and 

- a multiplexer 230 arranged to alternatively send
the signal from the VLC block 228 to the output
multiplexer 204 or a picture preanalysis chain
comprised of a bit usage profile module 232 and a rate
control (Mquant) module 234 which in turn controls the
quantizer 224 by adjusting the quantization step used
therein.

To sum up, the system shown in figure 8 includes two
additional blocks (that can be combined into one):
the low pass filter 218 and the downsampler 220.

Even if the syntax is the same, the output bitstream
OS will no longer be strictly MPEG-2 compliant, because
macroblocks are encoded over an 8-pixel width and height
while MPEG-2 only allows 16 pixels as the macroblock
dimensions.

So a specific decoder working on low-resolution 
anchor frames may be required. Alternatively, by 
changing slightly the syntax of the headers and the 
output VLC block, an H.26L compliant bit-stream can be 
produced. 

H.26L, also known as H.264, is an emerging standard,
expected to be widely used in the near future and
probably to replace the MPEG-4 standard in wireless
communications.

An advantage of this technique is that the
decoding process is performed on low-resolution images,
largely reducing the blocking artifacts. These
considerations are also confirmed by measuring the
block artifact level factor with the GBIM technique
(see "A generalized block-edge impairment metric for
video coding", H.R. Wu and M. Yuen, IEEE Signal
Processing Letters, vol. 4, No. 11, November 1997).

At least two different implementations of the 
system can be envisaged. 

In a first embodiment, low pass filtering is
performed before preanalysis: in this case the block
dimensions will remain 8x8 pixels, but only the low
frequency portion (4x4 pixels) will be non-zero. In
this case, the result is sub-optimal, but the advantage
is that the output bit-stream will still be MPEG-2
compliant.

Alternatively, together with the low-pass
filtering, a decimation phase is executed: the blocks
will be 4x4 pixels large, and the subsequent RL and VLC
coding steps will be effected on this structure,
generating a non-MPEG-2 bitstream. With this approach a
better quality can be reached.

The MPEG-2 video standard exhibits some
limitations for low bit-rates: the most evident one is
that the hierarchy syntax is very rigid and cannot be
changed according to what is really written into the
bit-stream.
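
The two embodiments described above (low-pass filtering alone, or low-pass filtering with decimation) can be illustrated on a single 8x8 block of DCT coefficients. The sketch below assumes that the low-pass operation amounts to retaining the top-left 4x4 coefficients and ignores any rescaling of the retained values; the function names are hypothetical.

```python
def keep_low_frequencies(block, keep=4):
    """First embodiment: the block stays 8x8, only the top-left
    keep x keep (low frequency) coefficients remain non-zero."""
    return [
        [value if r < keep and c < keep else 0 for c, value in enumerate(row)]
        for r, row in enumerate(block)
    ]


def decimate_block(block, keep=4):
    """Second embodiment: the block is reduced to keep x keep coefficients."""
    return [row[:keep] for row in block[:keep]]


# Hypothetical 8x8 block filled with the values 1..64 in raster order.
block = [[r * 8 + c + 1 for c in range(8)] for r in range(8)]
filtered = keep_low_frequencies(block)
small = decimate_block(block)
print(len(filtered), len(filtered[0]), len(small), len(small[0]))
```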

The transcoder does not execute a complete
recoding of the bit-stream content, but reduces the
information carried by the DCT coefficients with a
stronger quantization. This implies that all the
semantic structures of the incoming bit-stream
(headers, motion vectors, but also the macroblocks
number) are not changed and the bits used for this part
of the stream will be more or less copied into the
output one (syntax overhead).

For this reason, for very low bit-rates (under 1.5
Mbit/s for a D1 incoming image format and CIF as output),
it is not fair to compare this approach with a
complete decoding-filtering-reencoding process, because
in this last case, 1/4 of the incoming macroblocks will
be encoded, reducing the aforementioned overhead by
roughly a factor of 4.

In any case, this second approach requires, in
addition to a complete decoding of the incoming stream,
a new motion estimation and a greater latency at the
output: this latter limitation could be quite
significant e.g. in video-conferencing applications,
where the requirements on the interactivity of the
speakers (two or more) are very strict.

Moreover, under these conditions, the possible
dynamics of the mquant variations are reduced, because
the quantization parameters used are close to their
upper limit. For that reason, any large variation with
respect to the average mquant will be very visible, and
the controller must also take this problem into
account.

Also, the rate control implementation can be
different, according to the application and the data
bandwidth available on the transmission (or storage)
channel. For a CBR channel with low capacity (less than
1.5 Mbit/s) and low latency a very precise rate
control is important, accepting some block artifacts.

The situation is different if the only constraint 
is the final dimension of the data stream (consider an 
HDD or a magnetic support) : in this case, a smaller 
local precision can be tolerated. 

In the preferred implementation of the transcoding 
system, two different variations of the rate control 
are provided for low bitrate applications and only one 
for high bitrate . 

The difference between the two types of rate
control for low bit rate applications lies in how the
local feedback is taken into account and in the
preanalysis step.

The two controllers can be termed "High" and "Low"
feed-back: in both instances, the basic structure is
comprised of global control (for the target
calculation), preanalysis and a local feed-back loop,
and the parameters depend on the input and output
bitrates.

In the case of a low bitrate, in the target bit
rate calculation, a proportional control parameter is
needed (K_p): this constant can be parametrized,
depending on the input/output bit rate as follows:

K_p = DestBitrate / (SourceBitrate - DestBitrate)

This is shown in figure 14, where the value of
K-Prop (K_p) is shown as a function of the input bitrate
and the output bitrate. In order to enhance the
precision of the preanalysis (in terms of mquant
calculated) the mquant used to find the BUP (Bit Usage
Profile) must also be made parametrical.

In particular, while for high bitrates a fixed value
V can be used, for low bit rates an offset is added to
this value. Such an offset depends again on the
difference between the input and the output bitrate.

At the end of the preanalysis, two different 
working conditions are present concerning the BUP. 

The former occurs in the "high feedback" condition:
the BUP is calculated as explained before. When low
feedback is chosen, a new contribution is needed,
namely the derivative one.

If the mquant value is calculated
"proportionally", a correction must be made as follows:

In a preferred embodiment, as derivative 
estimation, the difference between the re-quantization 
mquant value of the current macroblock and the average 
of the previous picture has been chosen. 

The derivative contribution is introduced, in 
order to delay possible abrupt variation in the local 
control, and render the control more stable. 

The value of the constant K_d is then negative, and
it depends again on the input and output bit rates:

K_d = K · (SourceBitrate - DestBitrate) / DestBitrate
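
By way of example, the two bitrate-dependent constants K_p and K_d given in the text can be evaluated as follows. The function names are hypothetical, and the value of the constant K (taken here as negative so that K_d is negative) is an assumption, as the text does not specify it.

```python
def k_prop(source_bitrate, dest_bitrate):
    """K_p = DestBitrate / (SourceBitrate - DestBitrate)."""
    if dest_bitrate <= 0 or source_bitrate <= dest_bitrate:
        raise ValueError("expected 0 < dest_bitrate < source_bitrate")
    return dest_bitrate / (source_bitrate - dest_bitrate)


def k_deriv(source_bitrate, dest_bitrate, k):
    """K_d = K * (SourceBitrate - DestBitrate) / DestBitrate."""
    if dest_bitrate <= 0:
        raise ValueError("dest_bitrate must be positive")
    return k * (source_bitrate - dest_bitrate) / dest_bitrate


# Transcoding from 7 Mbit/s to 3 Mbit/s, with an assumed K of -0.5.
print(k_prop(7_000_000, 3_000_000))
print(k_deriv(7_000_000, 3_000_000, -0.5))
```

K_p decreases, and the magnitude of K_d increases, as the gap between the source and destination bitrates widens.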

The proportional constant in the local control,
that is proportional and integrative when the control
is tight, is very low (down to 0): only the integrative
contribution remains important. This fact allows a very
precise control of the final dimension of each GOP, and
the absence of proportional control prevents possible
fast variations of the mquant.

The arrangement disclosed herein has been 
evaluated in terms of quality by referring to the 
scheme shown in figure 10, where source samples SS are 
fed into an MPEG-2 encoder ENCMP2 . 

The coded data bitstream, at a bitrate B1, was fed
in parallel to:

- a decoding/re-encoding chain including an MPEG-2
decoder DECMP2 followed by another MPEG-2 encoder
ENCMP2' to re-encode the samples at a lower bitrate B2
in view of feeding to a further MPEG-2 decoder DECMP2',
and

- a downsampling transcoder DRS essentially
corresponding to the diagram of figure 9, configured to
transcode the video signal at the bitrate B2, followed
by another MPEG-2 decoder DECMP2''.

The goal of these measures is to ascertain whether
the final quality is increased as a result of dithering
being added to the quantization block of
re-quantization.

The sequences used exhibit different 
characteristics, as number of details per frame 
(Mobile&Calendar) , or global movements like panning 
(FlowerGarden) , etc. 

Two different criteria have been used for the
quality evaluation.

The former is objective quality measurement,
through the PSNR (Peak Signal-to-Noise Ratio) index.
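
For reference, the PSNR index can be computed as sketched below, using the usual definition PSNR = 10·log10(peak² / MSE) with a peak value of 255 for 8-bit samples. The function name and the flat-list representation of the pictures are assumptions made for the sake of the example.

```python
import math


def psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two equally sized sequences of samples."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak * peak / mse)


print(psnr([100, 100, 100, 100], [101, 99, 101, 99]))
```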

The latter is subjective quality evaluation,
watching the sequences via professional equipment (an
image sequence processor called 'Digitale VideoSysteme'
and a 'Barco' CVM3051 monitor).

The PSNR measures reported in Table 3 confirm the
enhancement of the quality using the dithered
re-quantization.

In the table below, the results obtained
transcoding from 7 Mbit/s to 3/2/1.5 Mbit/s are
presented. These numbers are compared for the rate
control with high (local proportional-integrative) and
low (preanalysis proportional-derivative and local
integrative) feedback. The sequence is the Philips one,
725 progressive PAL frames, 25 frames/s, D1 resolution
(720x576) down to CIF (360x288).

BitRate      Target     Low feed-back   % Err.   High feed-back   % Err.
1.5 Mbit/s   5437500    5310112         -2.34    5255598          -2.9
2.0 Mbit/s   7250000    7114829         -1.86    7124522          -1.73
3.0 Mbit/s   10875000   10684687        -1.75    10687411         -1.72

Table 1: High and Low feed-back comparisons: file size in bytes
with K_IP and K_IP = 1.0
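
The Target column is consistent with the output bitrate multiplied by the sequence duration (725 frames at 25 frames/s, i.e. 29 seconds) and divided by 8 to obtain bytes. The short sketch below reproduces this arithmetic and the percentage error; the function names are hypothetical.

```python
def target_bytes(bitrate_bps, frames, fps):
    """Expected file size in bytes for a constant bitrate stream."""
    return bitrate_bps * frames / fps / 8


def percent_error(actual, target):
    """Signed deviation of the actual size from the target, in percent."""
    return 100.0 * (actual - target) / target


print(target_bytes(1_500_000, 725, 25))
print(round(percent_error(5310112, 5437500), 2))
print(round(percent_error(7114829, 7250000), 2))
```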

It is also evident that the quality gain depends
on the final target bitrate and on the sequence
content: the gain becomes important when dithering can
work well. In other words, when the original sequence
is full of details and movements, the gain will be
higher: in any case, the final images are never
damaged, and in the worst case, the gain will be null.

It is also important to underline that the quality
gain is appreciable (about 1 dB) in the middle range of
quality (i.e. between 25 and 35 dB) where it is more
visible; for higher quality (from 40 to 45 dB) the gain
is smaller, but its visibility cannot be high either,
because the starting quality is already very high.

Other tests have been performed on a different D1
progressive sequence, transcoding with downsampling to
2 and 1.5 Mbit/s.

For each sequence used, the main characteristics 
were as follows: 

1. Demoiselle: PAL D1, 720x576x25 f/s, 1000
frames;

2. Titan: PAL D1, 720x576x25 f/s, 930 frames;

3. Philips: PAL D1, 720x576x25 f/s, 700 frames;

4. Twister: PAL D1, 720x576x25 f/s, 1000 frames.

The results are summarized in Table 2 below. 

File size in bytes, K_IP = 1.0, K_PB = 1.0

Sequence     Target 2 Mbit/s   File size   % Err.   Target 1.5 Mbit/s   File size   % Err.
Demoiselle   10000000          9862370     -1.38    7500000             7211351     -3.80
Titan        9320000           9191424     -1.38    7110000             6932480     -2.50
Philips      7080000           6867596     -2.80    5310000             5217141     -1.75
Twister      10000000          9818110     -1.80    7500000             7199840     -4.0

Table 2: Low feedback rate control



As regards the simulation results in terms of
PSNR (Peak Signal-to-Noise Ratio), several transcoding
bitrates have been tested: in particular from 10 to 4,
from 7 to 4 and from 4 to 4 Mbit/s.

This last case is useful to check whether the dither
signal can adversely affect the transcoding process
when the characteristic curves of input and output are
the same. In any case, the fact must be taken into
account that this case cannot exist in the real system
because under these circumstances the transcoder will
simply forward the input bitstream IS to the output OS,
without any processing.

Additional results are provided in Table 3 below. 

Mean PSNR (dB) (Dithered vs. Standard Re-quantization)

                  7 to 4 Mbit/s        10 to 4 Mbit/s       4 to 4 Mbit/s
Sequence          Y      U      V      Y      U      V      Y      U      V
Mobile&Calendar   0.83   0.77   0.75   1.05   0.86   0.82   0.06   0.00   0.00
Flower Garden     0.92   0.32   0.36   0.93   0.39   0.50   0.19   0.05   0.07
Brazilg           0.40   0.02   0.10   0.10   0.01  -0.09   0.00  -0.02  -0.01
Stefan            0.68   0.46   0.55   0.59   0.48   0.55   0.00  -0.01  -0.02
Fball             0.18   0.08   0.06   0.02   0.00   0.00   0.00   0.00   0.01

Table 3: Mean PSNR gain in decibels (Dithered vs. Standard
re-quantization)

Table 3 shows that the luminance component is
never damaged (positive numbers mean a gain of the
dithered approach with respect to the traditional one).

Concerning the chrominance components (U and V), in
some special conditions (e.g. when the sequence is not
rich in details) a very small degradation may occur: this
is not visible and does not change the general
behaviour of the system.

In the worst case (transcoding to the same output
bitrate as the input one) there are no evident losses
of quality: so using the dithering also in this
condition does not introduce loss of quality with
respect to standard re-quantization. In very smoothed
and uniform sequences (like Brazilg) or sequences
exhibiting frequent scene cuts and movement changes
(like Fball), the gain is smaller than in the other
cases. For very detailed sequences like
Mobile&Calendar, instead, the average gain can reach up
to 1 dB.

Analysis of scattergrams for luminance and
chrominance shows that the dithered approach is
better in the range of quality between 25 and 35 dB,
where the advantageous effects are clearly detectable.

Essentially, the arrangement disclosed herein
enhances the quality achievable in a system for
transcoding multimedia streams without introducing
complexity. Re-quantization is very easy to implement,
and leads to better final quality, without any drawback.

A gain in quality is thus achieved, without 
introducing complexity in the systems. This is a 
significant point as video transcoding techniques are 
becoming more and more important for a broad range of 
applications in the consumer electronics field: this 
particular approach can be easily applied, enhancing 
performance of the transcoding system. 

Of course, the underlying principle of the
invention remaining the same, the details and
embodiments may vary, also significantly, with respect
to what has been described and shown by way of example
only, without departing from the scope of the invention
as defined by the annexed claims.

CLAIMS

1. A method of converting digital signals between
a first (IS) and second (OS) format, the method
including the step of generating coefficients (X_n)
representative of such digital signals and the step of
subjecting such coefficients to quantization (q),
characterized in that it includes the steps of:

- generating a dither signal (W_n), and

- adding said dither signal (W_n) to said
coefficients (X_n) before said quantization (q) to
generate a quantized signal.

2. The method of claim 1, characterized in that
said quantization step is a uniform quantization step
(q).

3. The method of claims 1 or 2, characterized in
that it includes the steps of:

- subjecting such quantized signal to inverse
quantization, and

- subtracting said dither signal (W_n) from said
signal subjected to inverse quantization.

4. The method of any of the previous claims, 
characterized in that it includes the steps of: 

- subjecting each said coefficient (X_n) to a first
quantization step (q1) in the absence of any said
dither signal (W_n) being added to generate an
undithered quantized coefficient,

- checking if said undithered quantized
coefficient is equal to zero, and

- when said undithered quantized coefficient is
equal to zero, taking said undithered quantization
coefficient as said quantized signal, and

- when said undithered quantized coefficient is
different from zero, adding said dither signal (W_n) to
said coefficient (X_n) and subjecting said dithered
coefficient to a quantization step (q2) to generate
said quantized signal.

5. The method of any of the previous claims, 
characterized in that the spectrum of said dither 
signal (W) is selected from the group consisting of : 
gaussian, uniform, sinusoidal and triangular. 

6. The method of claim 5, characterized in that 
said dither signal (W n ) is generated as a pseudo-random 
variable having a uniform distribution by subsequently 
modifying said distribution to at least one 
distribution of said group. 

7. The method of any of claims 1, 5 or 6, 
characterized in that said dither signal is generated 
from a plurality of independent pseudo-random 
variables. 

8. The method of any of the previous claims, 
characterized in that it includes the step of 
subjecting said digital signals to a discrete cosine 
transform (DCT) to generate said coefficients to be 
quantized as DCT coefficients. 

9. The method of any of the previous claims,
characterized in that said quantization is a part of a
transcoding process between an input stream (IS) of
digital signals at a first bitrate (B1) and an output
stream of digital signals (OS) at a second bitrate
(B2), said second bitrate (B2) of said output stream
(OS) of digital signals being selectively controlled.

10. The method of claim 9, characterized in that
said input stream (IS) is subject to a preanalysis
process (232, 234) including:

- quantizing said signals with a given quantization
step (mquant), and

- evaluating the number of bits spent for coding
said coefficients, and in that said bitrate (B2)
of said output data stream (OS) is controlled as a
function of said preanalysis.

11. The method of claim 10, characterized in that
said control is of a proportional-integrative (PI)
type.

12. The method of either of claims 10 or 11,
characterized in that said input stream (IS) is a stream
of digital video signals including pictures arranged in
groups of pictures (GOP), and in that said bitrate
control assigns a value of target bits for each single
picture of a group of pictures (GOP).

13. The method of any of the previous claims,
characterized in that said quantization step (2 to 4)
is a part of a transcoding process between an input
stream of digital signals (IS) at a first bitrate (B1)
and an output bitrate (OS) at a second bitrate (B2),
said transcoding process including subjecting at least
part of said input digital signals to a low pass
filtering step (218) followed by a downsampling step
(220).

14. The method of claim 10 and claim 13, 
characterized in that said low pass filtering (218) is 
performed before said preanalysis. 

15. The method of claim 13 , characterized in that 
together with said low-pass filtering (218) a 
decimation step is executed. 

16. The method of any of the previous claims, 
characterized in that said digital signals are, in at 
least one of said first and second formats, MPEG 
encoded signals. 

17. A system for converting digital signals
between a first (IS) and second (OS) format, the system
being configured (18) for generating coefficients (X_n)
representative of such digital signals and including at
least one quantizer (20; q; q1; q2) for subjecting such
coefficients to quantization, characterized in that it
includes:

- a source of a dither signal (W_n), and

- an adder for adding said dither signal (W_n) to
said coefficients (X_n) before said quantization (q) to
generate a quantized signal.

18. The system of claim 17, characterized in that
said quantizer (20; q; q1; q2) is a uniform
quantizer (q).

19. The system of either of claims 17 or 18,
characterized in that it includes:

- an inverse quantizer for subjecting such
quantized signal to inverse quantization, and

- a subtractor for subtracting said dither signal
(W_n) from said signal subjected to inverse
quantization.

20. The system of any of the previous claims 17 to 
19, characterized in that it includes: 

- a first quantizer (q1) for subjecting each said
coefficient (X_n) to a first quantization step in the
absence of any said dither signal (W_n) being added to
generate an undithered quantized coefficient,

- a control module (100) for checking if said
undithered quantized coefficient is equal to zero,

- an output element (102) for taking said
undithered quantization coefficient as said quantized
signal when said undithered quantized coefficient is
equal to zero, and

- an adder (104) for adding said dither signal
(W_n) to said coefficient (X_n) when said undithered
quantized coefficient is different from zero, and a
second quantizer (q2) for subjecting said dithered
coefficient to a quantization step to generate said
quantized signal for feeding to said output element
(102).

21. The system of any of the previous claims 17 to
20, characterized in that said source of said dither
signal (W) is a source of signal having a distribution
selected from the group consisting of: gaussian,
uniform, sinusoidal and triangular.

22. The system of claim 21, characterized in that
said source is a source of a pseudo-random variable
having a uniform distribution having associated a
distribution modifier element for modifying said
distribution to at least one distribution of said
group.

23. The system of any of claims 17, 21 or 22,
characterized in that said source of dither signal
includes a plurality of sources of independent
pseudo-random variables.

24. The system of any of the previous claims,
characterized in that it includes a DCT transform
module (18) for subjecting said digital signals to a
discrete cosine transform (DCT) to generate said
coefficients to be quantized as DCT coefficients.

25. The system of any of the previous claims 17 to 
24, as a part of a transcoder for transcoding an input 
stream (IS) of digital signals at a first bitrate (B1)
into an output stream (OS) of digital signals at a 
second bitrate (B2), including a bitrate control block 
(234) for selectively controlling said second bitrate 
(B2) of said output stream (OS) of digital signals. 

26. The system of claim 25, characterized in that
it includes a preanalysis chain (224, 232, 234) for 
subjecting said input stream (IS) to a preanalysis 
process (232, 234), said chain including: 

- a quantizer (224) for quantizing said signals with 
a given quantization step (mquant), and

- a bit usage profile module (232) for evaluating
the number of bits spent for coding said
coefficients,

- and in that said bitrate control block (234) is 
configured for controlling the bitrate (B2) of said 
output data stream (OS) as a function of said 
preanalysis.

27. The system of claim 26, characterized in that 
said bitrate control block (234) includes a 
proportional-integrative (PI) controller.
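
A generic proportional-integrative controller acting on the quantization step can be sketched as follows. The error definition, the gains kp and ki, the base value and the clamping range of mquant are all assumptions made for illustration; the claims do not specify them.

```python
class PIRateController:
    """Hypothetical PI controller: raises mquant when more bits are
    spent than targeted, lowers it when fewer are spent."""

    def __init__(self, kp, ki, base_mquant, min_mquant=1.0, max_mquant=31.0):
        self.kp = kp                    # proportional gain (assumed)
        self.ki = ki                    # integral gain (assumed)
        self.base_mquant = base_mquant  # operating point (assumed)
        self.min_mquant = min_mquant
        self.max_mquant = max_mquant
        self.integral = 0.0             # accumulated relative error

    def update(self, target_bits, spent_bits):
        """Return the quantization step to use for the next unit."""
        error = (spent_bits - target_bits) / target_bits
        self.integral += error
        mquant = self.base_mquant + self.kp * error + self.ki * self.integral
        return min(self.max_mquant, max(self.min_mquant, mquant))
```

The proportional term reacts to the most recent overshoot, while the integral term removes any persistent offset between the bits spent and the target.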

28. The system of either of claims 26 or 27, for
use in connection with an input stream (IS) of digital
video signals including pictures arranged in groups of
pictures (GOP), characterized in that said bitrate
control block (234) is configured for assigning said
value of target bits for each single picture of a group
of pictures (GOP).

29. The system of any of the previous claims,
characterized in that said quantizer (224) is a part of
a transcoder adapted for transcoding an input stream of
digital signals (IS) at a first bitrate (B1) into an
output stream (OS) at a second bitrate (B2), said
transcoder including a low pass filter (218) followed
by a downsampling module (220) for subjecting at least
part of said input digital signals to lowpass filtering
and downsampling.

30. The method of claim 26 and claim 29,
characterized in that said low pass filter (218) is
arranged upstream of said preanalysis chain (224, 232,
234).

31. The system of claim 29, characterized in that 
a decimation module is associated with said low-pass 
filter (218).
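
Low-pass filtering followed by decimation can be illustrated on a one-dimensional row of samples. The three-tap kernel, the edge handling and the factor of two are assumptions chosen for brevity; the claims do not specify the filter (218) or the downsampling ratio.

```python
def lowpass_and_downsample(samples, kernel=(0.25, 0.5, 0.25), factor=2):
    """Filter a row of samples with a small FIR kernel (assumed),
    then keep one sample out of every `factor`."""
    half = len(kernel) // 2
    n = len(samples)
    filtered = []
    for i in range(n):
        acc = 0.0
        for k, coeff in enumerate(kernel):
            # Replicate the border samples at the edges of the row.
            j = min(max(i + k - half, 0), n - 1)
            acc += coeff * samples[j]
        filtered.append(acc)
    return filtered[::factor]
```

Filtering before decimation limits the aliasing that dropping samples would otherwise introduce.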

32. A computer program product directly loadable
in the internal memory of a digital computer and
including software code portions for performing the
method of any of claims 1 to 16 when the product is run
on a computer.

ABSTRACT

Digital signals are converted between a first (IS) 
and second (OS) format by a conversion process 
including the step of generating coefficients (Xn)
representing such digital signals. Such coefficients
may be e.g. Discrete Cosine Transform (DCT) coefficients
generated during encoding/transcoding of MPEG signals.
The coefficients are subject to quantization (q) by
generating a dither signal (Wn) that is added to the
coefficients (Xn) before quantization (q) to generate a
quantized signal. Preferably, each coefficient (Xn) is
first subject to a first quantization step (q1) in the
absence of any dither signal (Wn) added to generate an
undithered quantized coefficient. If the undithered
quantized coefficient is equal to zero, the undithered
quantized coefficient is taken as the output quantized
signal. If the undithered quantized coefficient is
different from zero, the dither signal (Wn) is added
and the dithered coefficient thus obtained is subject
to a quantization step (q2) to generate the output
quantized signal.

(Figure 7) 